ON MINIMUM SPANNING TREE-LIKE METRIC SPACES


MOMOKO HAYAMIZU*,† AND KENJI FUKUMIZU†,*


Abstract. We attempt to shed new light on the notion of ‘tree-like’ metric 
spaces by focusing on an approach that does not use the four-point condition. 

Our key question is: Given metric space M on n points, when does a fully 
labelled positive-weighted tree T exist on the same n vertices that precisely 
realises M using its shortest path metric? We prove that if a spanning tree 
representation, T, of M exists, then it is isomorphic to the unique minimum 
spanning tree in the weighted complete graph associated with M, and we 
introduce a fourth-point condition that is necessary and sufficient to ensure 
the existence of T whenever each distance in M is unique. In other words, a 
finite median graph, in which each geodesic distance is distinct, is simply a tree. 

Provided that the tie-breaking assumption holds, the fourth-point condition 
serves as a criterion for measuring the goodness-of-fit of the minimum spanning 
tree to M, i.e., the spanning tree-likeness of M. It is also possible to evaluate 
the spanning path-likeness of M. These quantities can be measured in $O(n^4)$
and $O(n^3)$ time, respectively.

1. Introduction 

Historically, graphs as finite metric spaces have been extensively studied [5]. 
Even though we approach them differently, we would like to emphasise, amongst 
others [IS [13 [H [H], the classical result provided by Buneman [B]. In short, a
metric on a finite set can be realised by the shortest path metric in a positive- 
weighted tree if and only if it satisfies the four-point condition. Not only is it 
frequently quoted in the context of evolutionary trees [14], but it is also known 
for its direct connection to the theory of Gromov hyperbolic metric spaces [8]. 
Approximately two decades after Buneman’s theorem, Bandelt proved the
existence of a unique tree representation for every metric satisfying the four-point 
condition. 

Given this background, a metric space that satisfies the four-point condition is 
commonly considered tree-like. However, an important caveat should be addressed: 
the four-point condition is necessary and sufficient to ensure the existence of a 
partially labelled tree that realises a given metric [5iiini[ii]. For example, a complete 
graph with a uniform edge length clearly satisfies the four-point condition, but it 
only becomes tree-like after an extra vertex is added. In this case, the four-point 
condition does not ensure that a metric is realised by a fully labelled tree on the 
same set. It does not characterise the distance within trees, in general, but rather 
the shortest path metrics induced by graphs of a certain class, called block graphs 
(i.e., graphs in which all biconnected components are complete subgraphs) [2].

2010 Mathematics Subject Classification. Primary 05C12; Secondary 05C05, 05C38. 

Key words and phrases. minimum spanning tree, tree metric, median graph.

* Department of Statistical Science, The Graduate University of Advanced Studies. 

† The Institute of Statistical Mathematics.





This may not create an issue in the field of conventional phylogenetics, but 
considering the recent surge of renewed biological interest in minimum spanning 
tree (MST)-based tree estimation [13], determining when a metric space is realised
by a positive-weighted tree on the same set is not only a natural undertaking but
also a meaningful one. Thus far, this problem has not been properly recognised,
much less addressed. The only two exceptions to this are the recent work provided
in [1] and in [9]. It seems to be a non-trivial question not only because it cannot be
answered using Buneman’s theorem, but also because it is equivalent to determining 
a method for recognising a special case of the metric travelling salesman problem 
(TSP). If an input—a metric on a set of cities—is the shortest path metric in a tree 
on the city set, the length of the optimal tour must equal twice the length of the 
MST. 

In this paper, we examine the sub-type of tree metrics without relying on the 
four-point condition. Our work is based on three ingredients: the so-called
tie-breaking assumption, which has been popular in algorithmic applications since the
work provided by Kruskal in [12]; what we call the fourth-point condition, which
can typically be found in the definition of median metric spaces [7]; and a simple
trick for metric-preserving edge removal, which applies to any finite metric space.
These concepts, which are part of our original results, are defined and discussed in
Section 2.

As expected, if it exists, a fully labelled positive-weighted tree that realises a 
finite metric space is the unique MST in its associated weighted complete graph 
(Proposition 2.13). Our goal is to prove the following: A finite metric space under
the tie-breaking rule is realised by the MST if and only if it satisfies the fourth-point
condition (Theorem 3.1). This implies that every finite median graph, in which the
shortest path lengths between all pairs of vertices are distinct, is necessarily a tree
(Corollary 3.3). This result also yields a stronger condition for understanding when
a finite metric space is realised, especially by a spanning path graph (Corollary 3.5).
We define and discuss the notion of a spanning tree-likeness of a finite metric space
in Section 4.


2. Preliminaries 

We apply the metric-related terminology provided in [7] throughout this paper. 
Let $(X, d_M)$ be a finite metric space, that is, a finite set, $X$, equipped with metric
$d_M$. For two distinct points $x$ and $x'$ in $X$, the closed metric interval between them
is defined to be the set

\[ I(x, x') := \{ i \in X : d_M(x, x') = d_M(x, i) + d_M(i, x') \}. \]

All graphs considered in this paper will be simple, undirected, fully labelled (i.e.,
each vertex is labelled), and positive weighted (i.e., each edge has a positive length).
A graph is denoted $(V, E; w)$ for a set, $V$, of labelled vertices and a set, $E$, of edges
that are associated with a positive edge-weighting function, $w : E \to \mathbb{R}_+$. Given
graph $G$, the sets of vertices and edges are denoted $V(G)$ and $E(G)$, respectively.
Moreover, graph $G$ is said to be a graph on $V(G)$. Vertices may be renamed as
needed, assuming no confusion arises, and a vertex labelled ‘$x$’ is referred to as
vertex $x$. The distance in graph $G$ is defined to be the shortest path metric and is
represented using $d_G$.

Assume $M$ is a finite metric space, $(X, d_M)$. Let $K_M$ be the weighted complete
graph associated with $M$. An edge of $K_M$ that joins two distinct vertices, $x$ and $x'$, is
denoted $e(x, x')$. This paper uses the terms ‘points’ and ‘vertices’ interchangeably
because there is a one-to-one correspondence between $X$ and $V(K_M)$ for any finite
metric space $M$.

2.1. Tie-breaking rule. 

Definition 2.1. A finite metric space, $(X, d_M)$, is said to satisfy the tie-breaking
rule if the values of $d_M$ are distinct for all pairs in $X$.
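
As an illustration only, the tie-breaking rule can be tested directly on a distance matrix. The Python sketch below assumes that the metric is supplied as a symmetric list of lists `D` with exact (e.g. integer) entries, so that equality tests are meaningful; the function name `satisfies_tie_breaking` is hypothetical and not part of the original text.

```python
from itertools import combinations


def satisfies_tie_breaking(D):
    """Return True if the distances of all unordered pairs of distinct points differ.

    D is assumed to be a symmetric matrix (list of lists) of exact distances.
    """
    n = len(D)
    values = [D[i][j] for i, j in combinations(range(n), 2)]
    # The rule holds exactly when no distance value is repeated.
    return len(values) == len(set(values))
```

For example, the path on three points with edge lengths 1 and 2 passes the test, whereas a triangle with a uniform edge length does not.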

2.2. The fourth point condition. 

Definition 2.2 (Figure 1). A finite metric space, $(X, d_M)$, is said to satisfy the
fourth-point condition if, for every (not necessarily distinct) three points $x, y, z \in X$,
there exists a point, $p^* \in X$, such that

\[ d_M(x, p^*) + d_M(y, p^*) + d_M(z, p^*) = \frac{1}{2}\{ d_M(x, y) + d_M(y, z) + d_M(z, x) \}. \]



Figure 1. Fourth point p* for triplet {x,y,z} 
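
A brute-force reading of Definition 2.2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the metric is given as a symmetric matrix `D` of exact (e.g. integer) distances, and the names `fourth_point` and `satisfies_fourth_point_condition` are hypothetical. The search tries every candidate point for every triplet, so it takes time proportional to the fourth power of the number of points.

```python
from itertools import combinations_with_replacement


def fourth_point(D, x, y, z):
    """Return a point p whose distances to x, y, z sum to half the perimeter, else None."""
    perimeter = D[x][y] + D[y][z] + D[z][x]
    for p in range(len(D)):
        # Compare doubled sums to avoid dividing exact distances by two.
        if 2 * (D[x][p] + D[y][p] + D[z][p]) == perimeter:
            return p
    return None


def satisfies_fourth_point_condition(D):
    """Check Definition 2.2 for every (not necessarily distinct) triplet."""
    n = len(D)
    return all(
        fourth_point(D, x, y, z) is not None
        for x, y, z in combinations_with_replacement(range(n), 3)
    )
```

For a star with centre 0 and leaves 1, 2, 3 at distances 1, 2, 4, the call `fourth_point(D, 1, 2, 3)` returns the centre, while a triangle with a uniform edge length has no fourth point for its three vertices.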


Proposition 2.3. If a finite metric space, $(X, d_M)$, satisfies the fourth-point
condition, fourth point $p^* \in X$ is unique for each triplet in $X$.

Proof. Suppose that there are two quartets, $\{x, y, z, p_1^*\}$ and $\{x, y, z, p_2^*\}$ ($p_1^* \neq p_2^*$),
in $X$ such that

\[ d_M(x, p_1^*) + d_M(y, p_1^*) + d_M(z, p_1^*) = d_M(x, p_2^*) + d_M(y, p_2^*) + d_M(z, p_2^*). \]

Because $d_M$ is a metric on $X$, we have

\begin{align*}
d_M(x, p_2^*) &\le d_M(x, p_1^*) + d_M(p_1^*, p_2^*), \\
d_M(y, p_2^*) &\le d_M(y, p_1^*) + d_M(p_1^*, p_2^*), \\
d_M(z, p_2^*) &\le d_M(z, p_1^*) + d_M(p_1^*, p_2^*).
\end{align*}

Therefore, $p_1^* = p_2^*$, but this is a contradiction. Hence, if $p^*$ exists for $\{x, y, z\}$, it is
unique. □

Proposition 2.4. The following is equivalent to saying that finite metric space
$(X, d_M)$ satisfies the fourth-point condition: For every (not necessarily distinct)
three points $x, y, z \in X$, there exists only one point $p^* \in I(x, y) \cap I(y, z) \cap I(z, x)$.

Proof. Because $d_M$ is a metric, for all $x, y, z, p \in X$, we have
$d_M(x, p) + d_M(y, p) + d_M(z, p) \ge \frac{1}{2}\{ d_M(x, y) + d_M(y, z) + d_M(z, x) \}$.
The equality holds if and only if
$I(x, y) \cap I(y, z) \cap I(z, x) = \{p^*\}$. Proposition 2.3 ensures the uniqueness of $p^*$. □
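
The inequality invoked in this proof follows by summing three triangle inequalities. A short derivation, using only the symbols already defined, can be written as:

```latex
\begin{align*}
d_M(x, y) &\le d_M(x, p) + d_M(p, y), \\
d_M(y, z) &\le d_M(y, p) + d_M(p, z), \\
d_M(z, x) &\le d_M(z, p) + d_M(p, x), \\
\text{hence}\quad
d_M(x, y) + d_M(y, z) + d_M(z, x) &\le 2\{ d_M(x, p) + d_M(y, p) + d_M(z, p) \}.
\end{align*}
```

Equality in the last line forces equality in each of the first three, which is the statement that $p$ lies in $I(x, y)$, $I(y, z)$, and $I(z, x)$ simultaneously.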

Remark 2.5. Fourth point $p^*$ is also known as the median for $\{x, y, z\}$ because it
minimises the sum of the distances to the three points, and a metric space satisfying 
the fourth-point condition (or a graph inducing this kind of metric space) is said 
to be median [1I2]. Although a discussion of this topic is provided in [213], it 
should be noted that median graphs include multiple types of graphs other than 
trees, such as grid and square graphs. 

Lemma 2.6. Let $C$ be a cycle graph, $(V, E; w)$, with $\sum_{e \in E} w(e) = c$, and let $d_C$
be the shortest path metric in $C$. Given three distinct points $x, y, z \in V$ such that
$d_C(x, y) + d_C(y, z) + d_C(z, x) = c$, the fourth point, $p^*$, exists in $V$ if and only if
$\max\{ d_C(x, y), d_C(y, z), d_C(z, x) \} = c/2$.

Proof. Without loss of generality, we can assume $d_C(z, x) = \max\{ d_C(x, y), d_C(y, z), d_C(z, x) \}$.
Clearly, $y \in I(x, y) \cap I(y, z)$. Therefore, $I(x, y) \cap I(y, z) \cap I(z, x) = \emptyset$ if and only
if $y \notin I(z, x)$. Under the assumption that the length of $C$ is fixed at $c$, this is
equivalent to stating that $d_C(z, x) \neq c/2$. Thus, $I(x, y) \cap I(y, z) \cap I(z, x) = \emptyset$ if and
only if $d_C(z, x) \neq c/2$. Applying Proposition 2.4 completes the proof. □
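
The statement of Lemma 2.6 can be checked numerically on small cycles. The following Python sketch is illustrative only: it assumes integer edge weights listed in cyclic order, and the helper names `cycle_metric`, `has_fourth_point`, and `lemma_2_6_holds` are hypothetical.

```python
from itertools import combinations


def cycle_metric(weights):
    """Shortest path metric of a cycle whose i-th edge joins vertex i and vertex (i + 1) % k."""
    k, c = len(weights), sum(weights)
    position = [0]
    for w in weights[:-1]:
        position.append(position[-1] + w)  # clockwise distance from vertex 0
    D = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            a = abs(position[i] - position[j])
            D[i][j] = min(a, c - a)  # shorter of the two arcs
    return D, c


def has_fourth_point(D, x, y, z):
    perimeter = D[x][y] + D[y][z] + D[z][x]
    return any(2 * (D[x][p] + D[y][p] + D[z][p]) == perimeter for p in range(len(D)))


def lemma_2_6_holds(weights):
    """Compare both sides of the equivalence for every triplet whose perimeter equals c."""
    D, c = cycle_metric(weights)
    for x, y, z in combinations(range(len(D)), 3):
        if D[x][y] + D[y][z] + D[z][x] == c:
            longest_is_half = 2 * max(D[x][y], D[y][z], D[z][x]) == c
            if has_fourth_point(D, x, y, z) != longest_is_half:
                return False
    return True
```

With weights (1, 2, 3, 4), so that c = 10, the triplet {0, 1, 3} has longest side 5 = c/2 and vertex 0 as its fourth point, whereas the triplet {0, 2, 3} has longest side 4 and no fourth point.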

2.3. Basic geodesic graphs. In this subsection, we present a simple trick for
metric-preserving edge removal, which can be used to represent an arbitrary finite
metric space as a graph with the fewest edges. Let $M$ be a finite metric space,
$(X, d_M)$, and assume $K_M$ is the weighted complete graph associated with $M$.

Definition 2.7. Suppose $G$ is a connected graph on finite set $X$ with shortest path
metric $d_G$. Graph $G$ is said to realise $M$ if $d_G(x, x') = d_M(x, x')$ for all $x, x' \in X$.

Definition 2.8. Given $x, x' \in X$, the edge, $e(x, x')$, of $K_M$ is said to be non-basic
if there is a permutation, $(x_1, x_2, \dots, x_k)$, on a non-empty subset of $X \setminus \{x, x'\}$
such that cyclic permutation $(x, x_1, x_2, \dots, x_k, x')$ satisfies

\[ d_M(x, x') = d_M(x, x_1) + d_M(x_1, x_2) + \dots + d_M(x_k, x'). \]

The edge is called basic otherwise.

Proposition 2.9. Let $x, y, z$ be three different vertices of $K_M$. When the three
edges, $e(x, y)$, $e(y, z)$, and $e(z, x)$, of $K_M$ are basic, the fourth point, $p^*$, does not
exist for $\{x, y, z\}$. If a non-basic edge exists, say $e(x, y)$, points $x$ and $y$ are the
only two candidates for $p^*$.

The proof of this proposition is straightforward. 

Definition 2.10. Assume $B_M$ is the set of all basic edges of $K_M$, and suppose $\lambda$
is the restriction of $d_M$ to $B_M$. The subgraph $G_M := (X, B_M; \lambda)$ of $K_M$ is called the
basic geodesic graph in $K_M$.
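
Definition 2.8 quantifies over all orderings of intermediate points, but for a metric it seems sufficient to test single intermediate points: if $d_M(x, x')$ equals the length of a path through $x_1, \dots, x_k$, then the triangle inequality forces $d_M(x, x') = d_M(x, x_1) + d_M(x_1, x')$. The Python sketch below relies on this observation, which is not stated in the text, and on the connectivity of the basic geodesic graph (Lemma 2.11). It assumes a symmetric matrix `D` of exact distances; the names `basic_edges` and `is_spanning_tree_metric` are hypothetical.

```python
from itertools import combinations


def basic_edges(D):
    """List the edges (x, y), x < y, that have no intermediate point on a geodesic."""
    n = len(D)
    edges = []
    for x, y in combinations(range(n), 2):
        non_basic = any(
            i != x and i != y and D[x][i] + D[i][y] == D[x][y]
            for i in range(n)
        )
        if not non_basic:
            edges.append((x, y))
    return edges


def is_spanning_tree_metric(D):
    """A connected graph on n vertices is a tree exactly when it has n - 1 edges."""
    return len(basic_edges(D)) == len(D) - 1
```

Under these assumptions the edge set is obtained in cubic time, since each of the quadratically many pairs is tested against every other point.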

Lemma 2.11. The basic geodesic graph, $G_M$, in $K_M$ is a connected graph on $X$
that realises $M$.

Proof. It suffices to prove that $G_M$ is connected. Assuming that $e(x, x')$ is
non-basic, we show that there is a path of basic edges joining $x$ and $x'$ in $K_M$. We also
note that they are obviously connected in $G_M$ if $e(x, x') \in E(K_M)$ is basic. Let
$C$ be a cycle with the greatest number of vertices (or edges) of all cycles in $K_M$
that share edge $e(x, x')$ and overall length $2 d_M(x, x')$. Let $V(C) = \{x, x'\} \cup Y$,
where $Y := \{x_1, \dots, x_k\}$ is a non-empty subset of $X \setminus \{x, x'\}$, as in Definition 2.8.
Furthermore, suppose $d_C$ is the shortest path metric induced by $C$ and $x_i, x_j \in V(C)$.
If a path existed in $K_M$ joining $x_i$ and $x_j$ that was shorter than $d_C(x_i, x_j)$,
then edge $e(x, x')$ would be longer than the path connecting $x$ and $x'$ through $x_i$
and $x_j$. Therefore, any path in $K_M$ joining two vertices in $V(C)$ must have a length
greater than or equal to $d_C(x_i, x_j)$. We use this fact at the end of the proof.

In order to obtain a contradiction, we suppose $e(y, y') \in E(C) \setminus \{e(x, x')\}$ is
non-basic. We define $C'$ to be a cycle in $K_M$ of overall length $2 d_M(y, y')$ with $e(y, y') \in
E(C')$, which is similar to our previous case except that $|V(C')|$ is unimportant.
Let $V(C') = \{y, y'\} \cup Z$, where $Z := \{y_1, \dots, y_l\} \subseteq X \setminus \{y, y'\}$. By Definition 2.8,
if a cycle contains a non-basic edge, then that edge must be strictly longer than the other
edges in the cycle. This implies that the number of non-basic edges contained in
each cycle is zero or one. Thus, $e(y, y')$ is shorter than $e(x, x')$, and $e(y, y')$ is the
longest edge in $E(C')$. Therefore, we can conclude that $e(x, x')$ is not in $E(C')$.
The assumption on $|V(C)|$ provides $Y \cap Z \neq \emptyset$. Our hypothesis ensures a path in
$K_M$ of length $d_C(y, y')$ that connects $y$ and $y'$ via $y'' \in Y \cap Z$. This implies that $K_M$
contains a path joining $y$ and $y''$ of length less than $d_C(y, y')$. If we assume that $y'$
lies in the shortest path joining $y$ and $y''$ in $C$ (note that the roles of $y$ and $y'$ can
be exchanged), then we have $d_C(y, y') < d_C(y, y'')$. It follows that there is a path
that joins $y$ and $y''$ in $K_M$ of length less than $d_C(y, y'')$. This is a contradiction.
Hence, $e(y, y')$ is basic, which completes the proof. □

Definition 2.12. Finite metric space $M$ is said to be a spanning tree metric space
if the basic geodesic graph, $G_M$, in the weighted complete graph, $K_M$, is a spanning
subtree in $K_M$. In particular, $M$ is said to be a spanning path metric space if $G_M$
is a path graph (i.e., a tree with two vertices of degree one and remaining vertices
of degree two) that spans all the vertices of $K_M$.

Proposition 2.13. Let $M := (X, d_M)$ be a spanning tree metric space and $G_M$ be
the basic geodesic graph in $K_M$. Then the following statements hold:

(1) $G_M$ is the unique minimum spanning tree in $K_M$;

(2) $G_M$ is the unique fully labelled tree on $X$ that realises $M$.

Proof. (1) Assume $B := E(G_M)$ and let $\bar{B} := E(K_M) \setminus B$. Because $|B| = |X| - 1$,
$G_M$ is the only spanning tree in $K_M$ such that all edges are basic. In addition,
let $e(x, x') \in \bar{B}$. Because $G_M$ is a tree, there is a unique path joining $x$ and $x'$,
denoted $P$. Each edge of $P$ must be strictly shorter than $d_M(x, x')$ for the following
reasons: the length of $P$ equals $d_M(x, x')$; the number of edges of $P$ exceeds one;
and the edge weights are all positive. Therefore, replacing an arbitrary edge of $P$
with $e(x, x')$ results in a spanning tree in $K_M$ of greater length. Hence, $G_M$ is
shorter than any other spanning tree in $K_M$.

(2) Suppose that $M$ is realised by
fully labelled tree $T$ on $X$. This implies that each edge of $T$ has a positive weight.
We can recover $K_M$ from $T$ by summing the weights along every path in $T$ that
has two or more edges. This process indicates that $T$ is isomorphic to the basic
geodesic graph in $K_M$. Hence, given (1), we know $T$ is unique. □
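
Proposition 2.13 suggests a direct test: compute a minimum spanning tree of $K_M$ and compare its shortest path metric with $d_M$. The following Python sketch does this with Kruskal's algorithm and a simple union-find structure. It is an illustration rather than the authors' procedure; it assumes a symmetric matrix `D` of exact distances with a zero diagonal, and the names `kruskal_mst`, `tree_metric`, and `mst_realises` are hypothetical.

```python
from itertools import combinations


def kruskal_mst(D):
    """Return the edges of a minimum spanning tree of the complete graph weighted by D."""
    n = len(D)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for x, y in sorted(combinations(range(n), 2), key=lambda e: D[e[0]][e[1]]):
        rx, ry = find(x), find(y)
        if rx != ry:  # the edge joins two different components
            parent[rx] = ry
            tree.append((x, y))
    return tree


def tree_metric(n, tree, D):
    """Shortest path lengths in the tree, with edge weights taken from D."""
    adj = {v: [] for v in range(n)}
    for x, y in tree:
        adj[x].append(y)
        adj[y].append(x)
    T = [[0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, None)]
        while stack:
            v, prev = stack.pop()
            for u in adj[v]:
                if u != prev:  # never walk back along the edge just used
                    T[s][u] = T[s][v] + D[v][u]
                    stack.append((u, v))
    return T


def mst_realises(D):
    """True if the path metric of the computed MST coincides with D."""
    return tree_metric(len(D), kruskal_mst(D), D) == D
```

When distances are tied, Kruskal's algorithm returns one of possibly several minimum spanning trees, so the test is most meaningful under the tie-breaking rule.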

Remark 2.14. Proposition 2.13 states that a metric space is uniquely realised by
the only MST if it is a spanning tree metric space. Note that we do not need
Buneman’s four-point condition in the argument (cf. [1]). Concerning the uniqueness
of the MST, the tie-breaking rule is a well-known sufficient condition established
by Borůvka [4] (cited in [12]) and by Kruskal [12]. The next section explores its
relation to spanning tree metric spaces.

3. Main results


Theorem 3.1. Let $M$ be a finite metric space, $(X, d_M)$, under the tie-breaking rule.
Then $M$ is a spanning tree metric space if and only if it satisfies the fourth-point
condition.

Proof. (i) The fourth-point condition clearly holds for all spanning tree metric
spaces. (ii) If $d_M$ is not a spanning tree metric on $X$, then we will show that there
is a triplet in $X$ that violates the fourth-point condition. According to Lemma
2.11, our assumption implies that the basic geodesic graph, $G_M = (X, B; \lambda)$, in
$K_M$ contains at least one cycle. Suppose $C := (X_k, B_k; \lambda_k)$ is the shortest cycle in
$G_M$, where $X_k \subseteq X$, $B_k \subset B$, $|X_k| = |B_k| = k$, and $\lambda_k$ is the restriction of $\lambda$ to $B_k$.
Then Proposition 2.9 yields $k \ge 4$. Let $c$ denote the sum of the $\lambda_k$ over all elements
in $B_k$. Also, assume that $d_C$ is the shortest path metric in $C$. For all $i, j \in X_k$, no
path in $G_M$ joining $i$ and $j$ has a shorter length than $d_C(i, j)$ (otherwise, $C$ would
not be the shortest cycle in $G_M$). Therefore, $d_C(i, j) = \min\{ a_{ij}, c - a_{ij} \}$, in which
$a_{ij}$ represents the length of the path in $C$ that travels from $i$ to $j$ in a clockwise
direction.

Consider a route in which we visit the points in $X_k$. Let $s \in X_k$ be the starting
point from which we travel along the circle in a clockwise direction. We assign a
label, ‘L’ or ‘R’, to every point $i \in X_k \setminus \{s\}$: label ‘L’ is assigned if $a_{si} < c/2$, and we
use label ‘R’ if $a_{si} \ge c/2$. If every point in $X_k \setminus \{s\}$ was labelled ‘L’, the last edge we
would traverse returning to $s$ would be non-geodesic or non-basic. Therefore, there
exists one and only one basic edge between vertices labelled ‘L’ and ‘R’. Suppose
that $t$ signifies the last point with label ‘L’ and $u$ indicates the first point with label
‘R’ as on the left in Figure 2. Note that $d_M(s, t) + d_M(t, u) + d_M(u, s) = c$.

We assume that $p^*$ exists for $\{s, t, u\}$ (otherwise, the assertion of the theorem
immediately follows). Lemma 2.6 gives us $\max\{ d_M(s, t), d_M(t, u), d_M(u, s) \} = c/2$.
Thus, $d_M(u, s) = c/2$ (the edge joining $t$ and $u$ is basic, and $d_M(s, t) < c/2$). Let
$v$ ($\neq u$) be a point in $X_k$ with label ‘R’ that is between $u$ and $s$ as on the right
in Figure 2. We know point $v$ exists because $e(u, s)$ would be non-basic otherwise.
According to the tie-breaking rule, we note that $a_{tv} \neq c - a_{tv}$. We can also set
$a_{tv} < c - a_{tv}$ in order to select $\{s, t, v\}$. Although we should select $\{t, u, v\}$ when
$a_{tv} > c - a_{tv}$, we limit our consideration to the former case. Therefore, we have
$d_M(s, t) + d_M(t, v) + d_M(v, s) = c$ again, but none of the three terms equals
$c/2$ (recall that $d_M(s, u) = c/2$). Hence, Lemma 2.6 implies that $p^*$ does not exist
for $\{s, t, v\}$, and this completes the proof. □



Figure 2. Points in the proof of Theorem 3.1

Remark 3.2. Given a finite metric space on $X$, we can determine in $O(|X|^4)$ time
whether it is a spanning tree metric space.

Corollary 3.3. Let $G$ be a median graph on finite set $X$ and let $d_G$ be the shortest
path metric of $G$. If each pair in $X$ has a different value for $d_G$, then $G$ is a tree.

Remark 3.4. As was mentioned in Remark 2.5, the fourth-point condition per se
is not a sufficient condition, but it is a necessary condition in order to ensure that
a finite metric space is induced by the shortest path metric in a tree (cf. a cycle
graph on four vertices with a uniform edge length).

Corollary 3.5. Suppose $M := (X, d_M)$ is a finite metric space under the
tie-breaking rule. Then $M$ is a spanning path metric space (Definition 2.12) if and
only if it satisfies the three-point condition: for every (not necessarily distinct)
three points $x, y, z \in X$, we have

\[ \max\{ d_M(x, y), d_M(y, z), d_M(z, x) \} = \frac{1}{2}\{ d_M(x, y) + d_M(y, z) + d_M(z, x) \}. \]

The condition can be confirmed in $O(|X|^3)$ time. If $M$ is a spanning path metric
space, it is realised by the unique shortest path that joins the farthest two points in
$X$.

Proof. We only prove the first statement. The three-point condition obviously
holds for all spanning path metric spaces. Therefore, we assume that the
three-point condition holds and show that the basic geodesic graph, $G_M$, in $K_M$ is a path
graph on $X$. It is clear that $y$ is the fourth point, $p^*$, for $\{x, y, z\}$ when the left-hand
side equals $d_M(z, x)$. This means that the fourth-point condition automatically
holds for any finite metric space that satisfies the three-point condition. Therefore,
our assumption implies that $G_M$ is a tree on $X$. The three-point condition also
indicates that every vertex in $G_M$ has a degree of one or two. In other words, if
vertex $x$ has degree three or more, then any three distinct vertices adjacent to $x$
would violate the three-point condition. Hence, $G_M$ is a path graph on $X$, which
completes the proof. □
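
The three-point condition of Corollary 3.5 can be tested by enumerating triplets, which takes cubic time. The Python sketch below is illustrative only; it assumes a symmetric matrix `D` of exact distances, and the function name `satisfies_three_point_condition` is hypothetical. Only distinct triplets are enumerated, because the equality holds trivially whenever two of the three points coincide.

```python
from itertools import combinations


def satisfies_three_point_condition(D):
    """True if, for every triplet, the longest side equals half the perimeter."""
    n = len(D)
    return all(
        2 * max(D[x][y], D[y][z], D[z][x]) == D[x][y] + D[y][z] + D[z][x]
        for x, y, z in combinations(range(n), 3)
    )
```

For instance, four points placed on a line at positions 0, 1, 3, 7 satisfy the condition, whereas the leaves of a star with three edges do not.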


4. Discussion 

The hyperbolicity of finite metric spaces (or graphs) is a concept provided by
Gromov [8] and measures the deviance of a metric space from Buneman’s
four-point condition. If a metric space, $M$, satisfies the four-point condition, then the
hyperbolicity of $M$ equals 0, and $M$ is said to be 0-hyperbolic. As was previously
discussed, any complete graph with a uniform edge length is 0-hyperbolic. Because
the four-point condition is a stronger version of the triangular inequality, all metric
triangles are also 0-hyperbolic. Therefore, although the value of hyperbolicity is
usually called the ‘tree-likeness’ of $M$, a more precise interpretation refers to the
partially labelled tree-likeness of $M$. As a final remark, we therefore provide the
notion of a fully labelled tree-likeness of $M$.

Let us say that finite metric space $M$ is $\rho$-roundabout. Here, $\rho$ is defined to be

\[ \rho := \max_{x, y, z \in X} \, \min_{i \in X} \, \frac{d_M(x, i) + d_M(y, i) + d_M(z, i)}{d_M(x, y) + d_M(y, z) + d_M(z, x)} - \frac{1}{2}. \]

This measures how far $M$ deviates from the fourth-point condition. Provided that
the tie-breaking rule holds, the value of $\rho$ can be regarded as the spanning
tree-likeness of $M$ or the circuitousness of $d_M$ as illustrated in Figure 3. Note that $\rho$ is
invariant under multiplication of $d_M$ by a constant. As we have already seen, $M$ is
0-roundabout if and only if there is an exact fit between $M$ and the MST.

Figure 3. Illustrations of spanning tree-likeness ($\rho = 0$ and $\rho > 0$)
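
The quantity $\rho$ can be evaluated by brute force in time proportional to the fourth power of the number of points. The Python sketch below is an illustration under stated assumptions: the metric is a symmetric matrix `D`, the function name `roundabout` is hypothetical, and the maximum is taken over triplets of distinct points only, since the ratio is undefined when all three points coincide and equals 1/2 when exactly two coincide.

```python
from itertools import combinations


def roundabout(D):
    """Largest deviation from the fourth-point condition over all distinct triplets."""
    n = len(D)
    rho = 0.0
    for x, y, z in combinations(range(n), 3):
        perimeter = D[x][y] + D[y][z] + D[z][x]
        # Smallest total distance from any point to the triplet.
        best = min(D[x][i] + D[y][i] + D[z][i] for i in range(n))
        rho = max(rho, best / perimeter - 0.5)
    return rho
```

A star with distinct edge lengths gives $\rho = 0$, while a triangle with a uniform edge length gives $\rho = 2/3 - 1/2 = 1/6$.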

The degree of violation of the three-point condition similarly provides the
spanning path-likeness of $M$, namely the maximum discrepancy between the left- and
right-hand sides of the triangular inequality. On the other hand, hyperbolicity does not
provide any information because all metric triangles are 0-hyperbolic.